batch statistics
Buffer layers for Test-Time Adaptation
Kim, Hyeongyu, Han, Geonhui, Hwang, Dosik
In recent advancements in Test Time Adaptation (TTA), most existing methodologies focus on updating normalization layers to adapt to the test domain. However, the reliance on normalization-based adaptation presents key challenges. First, normalization layers such as Batch Normalization (BN) are highly sensitive to small batch sizes, leading to unstable and inaccurate statistics. Moreover, normalization-based adaptation is inherently constrained by the structure of the pre-trained model, as it relies on training-time statistics that may not generalize well to unseen domains. These issues limit the effectiveness of normalization-based TTA approaches, especially under significant domain shift. In this paper, we introduce a novel paradigm based on the concept of a Buffer layer, which addresses the fundamental limitations of normalization layer updates. Unlike existing methods that modify the core parameters of the model, our approach preserves the integrity of the pre-trained backbone, inherently mitigating the risk of catastrophic forgetting during online adaptation. Through comprehensive experimentation, we demonstrate that our approach not only outperforms traditional methods in mitigating domain shift and enhancing model robustness, but also exhibits strong resilience to forgetting. Furthermore, our Buffer layer is modular and can be seamlessly integrated into nearly all existing TTA frameworks, resulting in consistent performance improvements across various architectures. These findings validate the effectiveness and versatility of the proposed solution in real-world domain adaptation scenarios. The code is available at https://github.com/hyeongyu-kim/Buffer_TTA.
AdverX-Ray: Ensuring X-Ray Integrity Through Frequency-Sensitive Adversarial VAEs
Caetano, Francisco, Viviers, Christiaan, Filatova, Lena, de With, Peter H. N., van der Sommen, Fons
Ensuring the quality and integrity of medical images is crucial for maintaining diagnostic accuracy in deep learning-based Computer-Aided Diagnosis and Computer-Aided Detection (CAD) systems. Covariate shifts are subtle variations in the data distribution caused by different imaging devices or settings and can severely degrade model performance, similar to the effects of adversarial attacks. Therefore, it is vital to have a lightweight and fast method to assess the quality of these images prior to using CAD models. AdverX-Ray addresses this need by serving as an image-quality assessment layer, designed to detect covariate shifts effectively. This Adversarial Variational Autoencoder prioritizes the discriminator's role, using the suboptimal outputs of the generator as negative samples to fine-tune the discriminator's ability to identify high-frequency artifacts. Images generated by adversarial networks often exhibit severe high-frequency artifacts, guiding the discriminator to focus excessively on these components. This makes the discriminator ideal for this approach. Trained on patches from X-ray images of specific machine models, AdverX-Ray can evaluate whether a scan matches the training distribution, or if a scan from the same machine is captured under different settings. Extensive comparisons with various OOD detection methods show that AdverX-Ray significantly outperforms existing techniques, achieving a 96.2% average AUROC using only 64 random patches from an X-ray. Its lightweight and fast architecture makes it suitable for real-time applications, enhancing the reliability of medical imaging systems. The code and pretrained models are publicly available.
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- North America > United States (0.04)
- Europe > Spain > Valencian Community (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Review for NeurIPS paper: Stochastic Normalization
Though the reviewers remark that the paper brings no insights/analysis, it was well-received by reviewers on average as an empirical architecture design idea, addressing an important problem. The experimental validation is conducted to the standards in the field and shows that the method is empirically useful. The combination BSS StochNorm is particularly promising. The authors are invited to submit the final version, considering the following improvements: - the paper can be densified to avoid self-repetitions and redundancy (of definitions of normalizations, descriptions of the contribution -- something like trice, of the existing methods, the algorithm and its description and Fig 1) - this space and the 9th page could be used to clarify important details of the experimental setup that are needed to understand what is the basis of comparison: how the hyperparameters are chosen per method, whether the 5 trials include a random train-validation splitting; include additional results from the rebuttal and discuss more along along the points below relating to the literature. As pointed out by reviewers, using moving averages is princily different from using batch statistics in that the moving average is considered as a constant for back-propagation.
DisCoPatch: Batch Statistics Are All You Need For OOD Detection, But Only If You Can Trust Them
Caetano, Francisco, Viviers, Christiaan, Zavala-Mondragón, Luis A., de With, Peter H. N., van der Sommen, Fons
Out-of-distribution (OOD) detection holds significant importance across many applications. While semantic and domain-shift OOD problems are well-studied, this work focuses on covariate shifts - subtle variations in the data distribution that can degrade machine learning performance. We hypothesize that detecting these subtle shifts can improve our understanding of in-distribution boundaries, ultimately improving OOD detection. In adversarial discriminators trained with Batch Normalization (BN), real and adversarial samples form distinct domains with unique batch statistics - a property we exploit for OOD detection. We introduce DisCoPatch, an unsupervised Adversarial Variational Autoencoder (VAE) framework that harnesses this mechanism. During inference, batches consist of patches from the same image, ensuring a consistent data distribution that allows the model to rely on batch statistics. DisCoPatch uses the VAE's suboptimal outputs (generated and reconstructed) as negative samples to train the discriminator, thereby improving its ability to delineate the boundary between in-distribution samples and covariate shifts. By tightening this boundary, DisCoPatch achieves state-of-the-art results in public OOD detection benchmarks. The proposed model not only excels in detecting covariate shifts, achieving 95.5% AUROC on ImageNet-1K(-C) but also outperforms all prior methods on public Near-OOD (95.0%) benchmarks. With a compact model size of 25MB, it achieves high OOD detection performance at notably lower latency than existing methods, making it an efficient and practical solution for real-world OOD detection applications. The code will be made publicly available
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Leveraging Normalization Layer in Adapters With Progressive Learning and Adaptive Distillation for Cross-Domain Few-Shot Learning
Yang, Yongjin, Kim, Taehyeon, Yun, Se-Young
Cross-domain few-shot learning presents a formidable challenge, as models must be trained on base classes and then tested on novel classes from various domains with only a few samples at hand. While prior approaches have primarily focused on parameter-efficient methods of using adapters, they often overlook two critical issues: shifts in batch statistics and noisy sample statistics arising from domain discrepancy variations. In this paper, we introduce a novel generic framework that leverages normalization layer in adapters with Progressive Learning and Adaptive Distillation (ProLAD), marking two principal contributions. First, our methodology utilizes two separate adapters: one devoid of a normalization layer, which is more effective for similar domains, and another embedded with a normalization layer, designed to leverage the batch statistics of the target domain, thus proving effective for dissimilar domains. Second, to address the pitfalls of noisy statistics, we deploy two strategies: a progressive training of the two adapters and an adaptive distillation technique derived from features determined by the model solely with the adapter devoid of a normalization layer. Through this adaptive distillation, our approach functions as a modulator, controlling the primary adapter for adaptation, based on each domain. Evaluations on standard cross-domain few-shot learning benchmarks confirm that our technique outperforms existing state-of-the-art methodologies.
Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation
Online controlled experiments have emerged as industry gold standard for assessing new web features. As new web algorithms proliferate, experimentation platform faces an increasing demand on the velocity of online experiments, which encourages adaptive traffic testing methods to speed up identifying best variant by efficiently allocating traffic. This paper proposed four Bayesian batch bandit algorithms (NB-TS, WB-TS, NB-TTTS, WB-TTTS) for eBay's experimentation platform, using summary batch statistics of a goal metric without incurring new engineering technical debts. The novel WB-TTTS, in particular, demonstrates as an efficient, trustworthy and robust alternative to fixed horizon A/B testing. Another novel contribution is to bring trustworthiness of best arm identification algorithms into evaluation criterion and highlight the existence of severe false positive inflation with equivalent best arms. To gain the trust of experimenters, experimentation platform must consider both efficiency and trustworthiness; However, to the best of authors' knowledge, trustworthiness as an important topic is rarely discussed. This paper shows that Bayesian bandits without neutral posterior reshaping, particularly naive Thompson sampling (NB-TS), are untrustworthy because they can always identify an arm as the best from equivalent best arms. To restore trustworthiness, a novel finding uncovers connections between convergence distribution of posterior optimal probabilities of equivalent best arms and neutral posterior reshaping, which controls false positives. Lastly, this paper presents lessons learned from eBay's experience, as well as thorough evaluations. We hope this work is useful to other industrial practitioners and inspires academic researchers interested in the trustworthiness of adaptive traffic experimentation.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (6 more...)
Revisiting adapters with adversarial training
Rebuffi, Sylvestre-Alvise, Croce, Francesco, Gowal, Sven
While adversarial training is generally used as a defense mechanism, recent works show that it can also act as a regularizer. By co-training a neural network on clean and adversarial inputs, it is possible to improve classification accuracy on the clean, non-adversarial inputs. We demonstrate that, contrary to previous findings, it is not necessary to separate batch statistics when co-training on clean and adversarial inputs, and that it is sufficient to use adapters with few domain-specific parameters for each type of input. We establish that using the classification token of a Vision Transformer (ViT) as an adapter is enough to match the classification performance of dual normalization layers, while using significantly less additional parameters. First, we improve upon the top-1 accuracy of a non-adversarially trained ViT-B16 model by +1.12% on ImageNet (reaching 83.76% top-1 accuracy). Second, and more importantly, we show that training with adapters enables model soups through linear combinations of the clean and adversarial tokens. These model soups, which we call adversarial model soups, allow us to trade-off between clean and robust accuracy without sacrificing efficiency. Finally, we show that we can easily adapt the resulting models in the face of distribution shifts. Our ViT-B16 obtains top-1 accuracies on ImageNet variants that are on average +4.00% better than those obtained with Masked Autoencoders.
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > Middle East > Jordan (0.04)
BYOL tutorial: self-supervised learning on CIFAR images with code in Pytorch
After presenting SimCLR, a contrastive self-supervised learning framework, I decided to demonstrate another infamous method, called BYOL. Bootstrap Your Own Latent (BYOL), is a new algorithm for self-supervised learning of image representations. It does not explicitly use negative samples. Negative samples are images from the batch other than the positive pair. As a result, BYOL is claimed to require smaller batch sizes, which makes it an attractive choice.
Test-time Batch Statistics Calibration for Covariate Shift
You, Fuming, Li, Jingjing, Zhao, Zhou
Deep neural networks have a clear degradation when applying to the unseen environment due to the covariate shift. Conventional approaches like domain adaptation requires the pre-collected target data for iterative training, which is impractical in real-world applications. In this paper, we propose to adapt the deep models to the novel environment during inference. An previous solution is test time normalization, which substitutes the source statistics in BN layers with the target batch statistics. However, we show that test time normalization may potentially deteriorate the discriminative structures due to the mismatch between target batch statistics and source parameters. To this end, we present a general formulation α-BN to calibrate the batch statistics by mixing up the source and target statistics for both alleviating the domain shift and preserving the discriminative structures. Based on α-BN, we further present a novel loss function to form a unified test time adaptation framework Core, which performs the pairwise class correlation online optimization. Extensive experiments show that our approaches achieve the state-of-the-art performance on total twelve datasets from three topics, including model robustness to corruptions, domain generalization on image classification and semantic segmentation. Particularly, our α-BN improves 28.4% to 43.9% on GTA5 Cityscapes without any training, even outperforms the latest source-free domain adaptation method. Deep neural networks (DNNs) achieve impressive success across various applications, but heavily rely on the independent and identical distribution (i.i.d.) assumption. However, in real-world applications, the model is prone to encounter the novel instances. For examples, an automatic pilot should have robust performance under different weather conditions.
BYOL works even without batch statistics
Richemond, Pierre H., Grill, Jean-Bastien, Altché, Florent, Tallec, Corentin, Strub, Florian, Brock, Andrew, Smith, Samuel, De, Soham, Pascanu, Razvan, Piot, Bilal, Valko, Michal
Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids collapse to a trivial, constant representation. Thus, it has recently been hypothesized that batch normalization (BN) is critical to prevent collapse in BYOL. Indeed, BN flows gradients across batch elements, and could leak information about negative views in the batch, which could act as an implicit negative (contrastive) term. However, we experimentally show that replacing BN with a batch-independent normalization scheme (namely, a combination of group normalization and weight standardization) achieves performance comparable to vanilla BYOL ($73.9\%$ vs. $74.3\%$ top-1 accuracy under the linear evaluation protocol on ImageNet with ResNet-$50$). Our finding disproves the hypothesis that the use of batch statistics is a crucial ingredient for BYOL to learn useful representations.